ABSTRACT
A gene is a basic unit of DNA located in the nucleus of the human cell. Data mining techniques currently have a
huge impact on human genetics and gene sequence data analysis. Gene sequence analysis subjects a DNA
sequence to systematic methods in order to determine a gene's configuration, nature and characteristics.
CBC and MNBC, applied to gene sequence data analysis, aim to separate diseased diabetic genes from the
vast stream of DNA gene sequence elements present in a large body of statistical data.
This technique attempts to establish and validate methods and tools for analyzing diseased gene sequences. It also
helps to classify and interpret the results accurately and meaningfully. This study combines supervised and
unsupervised machine learning techniques for data analysis: clustering is done by CBC, whereas classification
is done by MNBC. The approach recognizes gene expressions by framing association rules in accordance with
support and confidence measures on the input data set. It extracts and filters the required data into clusters
based on the CBC technique, thereby drafting association rules. These rules are then applied to the testing
dataset to filter the required (diseased) gene sequences. Finally, the MLRC algorithm is applied as the classification algorithm
to identify the class labels of test gene sequences in a big dataset. In medical diagnosis, gene data mining
techniques based on gene discretization models help to identify various associations between DNA gene-based
progressions and inconsistencies in disease infection transformations. Above all, the approach overcomes the
limitations of existing Support Vector Machine classification technology, which incurs high computational cost
and an increased number of iterations.
Keywords: Data mining, Data analysis, DNA gene, Gene sequence, Support Vector Machine classification
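
The support and confidence measures used to frame association rules can be illustrated with a minimal sketch.
It is not the authors' implementation. It assumes that each gene sequence is turned into a set of overlapping
k-mers plus a class label item; the toy sequences, the labels, the choice k = 3 and the helper names (kmers,
support, confidence) are all hypothetical and serve only to show how the two measures are computed.

```python
from typing import FrozenSet, List


def kmers(seq: str, k: int = 3) -> FrozenSet[str]:
    """Return the set of overlapping k-mers found in a sequence."""
    return frozenset(seq[i:i + k] for i in range(len(seq) - k + 1))


def support(transactions: List[FrozenSet[str]], items: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0.0:
        return 0.0
    return support(transactions, antecedent | consequent) / base


# Hypothetical toy data: (sequence, label) pairs invented for illustration only.
labelled = [
    ("ATGCGT", "diseased"),
    ("ATGCCA", "diseased"),
    ("TTGACA", "healthy"),
    ("ATGAAA", "healthy"),
]

# Each transaction holds the k-mers of one sequence plus its label item.
transactions = [kmers(seq) | {f"label={label}"} for seq, label in labelled]

# Candidate rule: the k-mer "TGC" implies the label "diseased".
rule_support = support(transactions, frozenset({"TGC", "label=diseased"}))
rule_confidence = confidence(
    transactions, frozenset({"TGC"}), frozenset({"label=diseased"})
)
print(rule_support, rule_confidence)
```

A rule would be retained only if both values exceed thresholds chosen by the analyst; such thresholds are not
stated in the text above and would have to be set for the data at hand.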